Designing Fault-Tolerant Mission-Critical Middleware Infrastructure for Distributed Real-time and Embedded Systems

Authors

  • Matthew Gillen
  • Paul Rubel
  • Jaiganesh Balasubramanian
  • Aaron Paulos
  • Joseph Loyall
  • Aniruddha Gokhale
  • Priya Narasimhan
  • Richard Schantz
Abstract

Fault tolerance is a crucial design consideration for mission-critical distributed real-time and embedded (DRE) systems, such as avionics mission computing systems and supervisory control and data acquisition systems. An increasing number of these systems are built using emerging middleware standards, such as publish-subscribe communication services and component-based architectures. Most previous R&D efforts in fault-tolerant middleware have focused on client-server object systems. Applying this research to concrete domains frequently requires specialization, hand-tailoring, and customization to accommodate the real-world challenges of these systems, including nondeterminism, scale, and interaction patterns. This paper describes our current applied R&D efforts to develop fault tolerance technology for a specific piece of mission-critical DRE infrastructure, a dynamic resource manager, built using CORBA components and exhibiting characteristics representative of real-world DRE systems. This paper makes three contributions to the design and implementation of fault-tolerant support in DRE systems. First, we describe the fault tolerance challenges presented by these systems, including support for component infrastructure, mixed-mode FT techniques (supporting both active and passive fault tolerance), support for nondeterminism, issues of scale (limiting the spread of the FT infrastructure to those elements that require it), and the need for predictable and bounded recovery times. Second, we describe the design of our fault-tolerant DRE architecture. Finally, we illustrate the fault recovery times of our FT DRE infrastructure in the presence of faults for a representative domain, a mission-critical computing environment.

Related resources

Adding Fault-Tolerance to a Hierarchical DRE System

Dynamic resource management is a crucial part of the infrastructure for emerging mission-critical distributed real-time embedded systems. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes an ongoing effort to develop a fault-tolerant multi-layer dynamic resource management capability and the challenges we have encountered, includ...

Towards Middleware for Fault-Tolerance in Distributed Real-Time and Embedded Systems

Distributed real-time and embedded (DRE) systems often require support for multiple simultaneous quality of service (QoS) properties, such as real-timeliness and fault tolerance, that operate within resource-constrained environments. These resource constraints motivate the need for a lightweight middleware infrastructure, while the need for simultaneous QoS properties requires the middleware to ...

Applying Patterns to Improve the Performance of Fault Tolerant CORBA

An increasing number of mission-critical, embedded, telecommunications, and financial distributed systems are being developed using distributed object computing middleware, such as CORBA. Applications for these systems often require the underlying middleware, operating systems, and networks to provide end-to-end quality of service (QoS) support to enhance their efficiency, predictability, scala...

Fault Tolerance in a Multi-Layered DRE System: A Case Study

Dynamic resource management is a crucial part of the infrastructure for emerging distributed real-time embedded systems, responsible for keeping mission-critical applications operating and allocating the resources necessary for them to meet their requirements. Because of this, the resource manager must be fault-tolerant, with nearly continuous operation. This paper describes our efforts to deve...

Real-time CORBA on MICO-MT – Design, Implementation, Performance and Application

Mission-critical systems such as avionics, process control, and telecommunication infrastructure, operating in distributed heterogeneous environments, demand interfaces from the underlying middleware, OS, and networks to enhance the predictability, dependability, and scalability of the system. The Object Management Group (OMG) has addressed middleware-level real-time and fault tolerance issues in Real-Time CORBA...

Publication date: 2006